Pertussis toxin gene: cloning and expression of protective antigen

ABSTRACT

The complete nucleotide sequence of the pertussis toxin gene and the deduced amino acid sequences of the individual subunits have been determined. All five subunits are coded by closely linked cistrons and possibly expressed through a polycistronic mRNA, since a promoter-like structure was found in the 5' flanking region. The order of the cistrons is S1, S2, S4, S5, and S3. All subunits contain signal peptides of variable length. The calculated molecular weights of the mature subunits are 25,024 for S1, 21,924 for S2, 21,873 for S3, 12,058 for S4 and 11,013 for S5. Subunits S2 and S3 share 70% amino acid homology and 75% nucleotide homology. Subunit S1 contains two regions of eight amino acids homologous to analogous regions in the A subunit of both cholera and E. coli heat labile toxins. Functional domains in relation to the primary structure and the development of a safer, new generation vaccine against whooping cough are presented.

BACKGROUND OF THE INVENTION

1. Technical Field

The present invention is related to molecular cloning of pertussis toxin genes capable of expressing an antigen peptide protective against pertussis. More particularly, the present invention is related to a bacterial plasmid pPTX42 encoding pertussis toxin.

2. State of The Art

Pertussis toxin is one of the various toxic components produced by virulent Bordetella pertussis, the microorganism that causes whooping cough. A wide variety of biological activities such as histamine sensitization, insulin secretion, lymphocytosis promoting and immunopotentiating effects can be attributed to this toxin. In addition to these activities, the toxin provides protection to mice when challenged intracerebrally or by aerosol. Pertussis toxin is, therefore, an important constituent in the vaccine against whooping cough and is included as a component in such vaccines.

However, while this toxin is one of the major protective antigens against whooping cough, it is also associated with a variety of pathophysiological activities and is believed to be the major cause of harmful side effects associated with the present pertussis vaccine. In most recipients these side effects are limited to local reactions, but in rare cases neurological damage and death do occur (Baraff et al., 1979 in Third International Symposium on Pertussis. U.S. HEW publication No. NIH-79-1830). Thus a need to produce a new generation of vaccine against whooping cough is evident.

SUMMARY OF THE INVENTION

It is, therefore, an object of the present invention to clone the gene(s) responsible for expression of pertussis toxin.

It is a further object of the present invention to isolate at least a part of the pertussis toxin genome and determine the nucleotide sequence and genetic organization thereof.

It is yet another object of the present invention to characterize the toxin polypeptide encoded by the cloned gene(s), at least in terms of the amino acid sequence thereof.

Other objects and advantages of the present invention will become evident upon a reading of the detailed description of the invention presented herein.

BRIEF DESCRIPTION OF DRAWINGS

These and other objects, features and many of the attendant advantages of the invention will be better understood upon a reading of the following detailed description when considered in connection with the accompanying drawings wherein:

FIG. 1 shows SDS-electrophoresis of the products of HPLC separation of pertussis toxin. Lanes 1 and 12 contain 5 μg and 10 μg, respectively, of unfractionated pertussis toxin. Lanes 2 through 11 contain 100 μl aliquots of elution fractions 19 through 28, respectively. The molecular weights of the subunits are indicated;

FIG. 2 shows a restriction map of the cloned 4.5 kb EcoRI/BamHI B. pertussis DNA fragment and genomic DNA in the region of the pertussis toxin subunit gene. (a) Restriction map of a 26 kb region of B. pertussis genomic DNA containing pertussis toxin genes. (b) Restriction map of the 4.5 kb EcoRI/BamHI insert from pPTX42. The arrow indicates the start and translation direction of the mature toxin subunit. The location of the Tn5 DNA insertion in mutant strains BP356 and BP357 is shown. (c) PstI fragment derived from the insert shown in panel b;

FIG. 3 shows Southern blot analysis of B. pertussis genomic DNA with cloned DNA probes. (a) Total genomic DNA from strain 3779 was digested with various restriction enzymes as indicated on the figure, and analyzed by Southern blot using nick translated PstI fragment B of pPTX42 (see FIG. 2c). (b) Between 24 μg and 60 μg of genomic DNA from strains 3779, Sakairi (pertussis toxin⁻, Tn5⁻), BP347 (non-virulent, Tn5⁺), BP349 (hemolysin⁻, Tn5⁺), BP353 (filamentous hemagglutinin⁻, Tn5⁺), BP356 and BP357 (both pertussis toxin⁻, Tn5⁺) (15) (lanes 1 through 7, respectively) were digested with PstI and analyzed by Southern blot using nick translated PstI fragment B as the probe. (c) The same as panel b except PstI fragment C was used as the probe;

FIG. 4 shows the physical map and genetic organization of the Pertussis Toxin Gene. (a) Restriction map of the 4.5 kb EcoRI/BamHI fragment from pPTX42 containing the pertussis toxin gene cloned from B. pertussis strain 3779 (12). The arrow indicates the position of the Tn5 DNA insertion in pertussis toxin negative Tn5-induced mutant strains BP356 and BP357 (24). (b) Open reading frames in the forward direction. (c) Open reading frames in the backward direction. The vertical lines indicate termination codons. (d) Organizational map of the pertussis toxin gene. The arrows show the translational direction and length of the protein coding regions for the individual subunits. The hatched boxes represent the signal peptides. The solid bars in S1 represent the regions homologous to the A subunits in cholera and E. coli heat labile toxins; and

FIG. 5 shows the physical map of the pertussis toxin S4 subunit gene. (a) Restriction map of the 4.5 kilobase pair (kb) EcoRI/BamHI fragment inserted into pMC1403. (b) Detailed restriction map and sequencing strategy of the PstI fragment B containing the S4 subunit gene. Only the restriction sites used for subcloning prior to sequencing are shown. Closed circled arrows show the sequencing strategy using dideoxy chain termination and open circled arrows show the sequencing strategy using base-specific chemical cleavage. The arrows show the direction and the length of the sequence determination. The heavy black line represents the S4 coding region. (c) Open reading frames in the three forward directions. (d) Open reading frames in the three backward directions. The vertical lines indicate termination codons.

DETAILED DESCRIPTION OF INVENTION

The above objects and advantages of the present invention are achieved by molecular cloning of pertussis toxin genes. The cloning of the gene provides means for genetic manipulation thereof and for producing a new generation of substantially pure and isolated antigenic peptides (toxins) for the synthesis of a new generation of vaccine against pertussis. Of course, such manipulation of the pertussis toxin gene and the creation of new, manipulated toxins retaining antigenicity against pertussis but being devoid of undesirable side effects was not heretofore possible. The present invention is the first to clone the pertussis toxin gene in an expression vector, to map its nucleotide sequence and to disclose the fingerprint of the polypeptide encoded by said gene(s).

Any vector wherein the gene can be cloned by recombination of genetic material and which will express the cloned gene can be used, such as bacterial (e.g. λgt11), yeast (e.g. PGPD-1), viral (e.g. PGS20 or pMM4), and the like. A preferred vector is the microorganism E. coli, wherein the pertussis gene has been cloned in the plasmid thereof.

Although any similar or equivalent methods and materials could be used in the practice or testing of the present invention, the preferred methods and materials are now described. All scientific and/or technical terms used herein have the same meaning as generally understood by one of ordinary skill in the art to which the invention belongs. All references cited hereunder are incorporated herein by reference.

MATERIALS AND METHODS

Materials. Restriction enzymes were purchased from Bethesda Research Laboratories (BRL) or International Biotechnologies, Inc. and used under conditions recommended by the suppliers. T4 DNA ligase, M13mp19 RF vector, isopropylthio-β-galactoside (IPTG), 5-bromo-4-chloro-3-indolyl-β-D-galactoside (X-Gal), the 17-bp universal primer, Klenow fragment (Lyphozyme®) and T₄ polynucleotide kinase were purchased from BRL. Calf intestine phosphatase was obtained from Boehringer Mannheim, nucleotides from PL-Biochemicals and base modifying chemicals from Kodak (dimethylsulfate, hydrazine and piperidine) and EM Science (formic acid). Plasmid pMC1403 and E. coli strain JM101 (supE, thi, Δ(lac-proAB), [F', traD36, proAB, lacI Z ΔM15]) were obtained from Dr. Francis Nano (Rocky Mountain Laboratories, Hamilton, Montana 59840). Elutip-d® columns came from Schleicher and Schuell and low melting point agarose (LMP Agarose®) from BRL. Radiochemicals were supplied by ICN Radiochemicals (crude [γ-³²P]ATP, 7000 Ci/mmol) and NEN Research Products ([α-³²P]dGTP, 800 Ci/mmol). B. pertussis strain 3779 was obtained from Dr. John J. Munoz, Rocky Mountain Laboratories, Hamilton, Mont. 59840. This strain is also known as 3779 BL2S4 and is commonly available.

Purification of Pertussis Toxin Subunits: Pertussis toxin from B. pertussis strain 3779 was prepared by the method of Munoz et al, Cell Immunol. 83:92-100, 1984. Five mg of the toxin was resuspended in trifluoroacetic acid and fractionated by high pressure liquid chromatography, HPLC, using a 1×25 cm Vydac C-4 preparative column. The sample was injected in 50% trifluoroacetic acid and eluted at 4 ml/min over 30 min with a linear gradient of 25% to 100% acetonitrile solution containing 66% acetonitrile and 33% isopropyl alcohol. All solutions contained 0.1% trifluoroacetic acid. Elution was monitored at 220 nm and two ml fractions collected. Aliquots of selected fractions were dried by evaporation, resuspended in gel loading buffer containing 2-mercaptoethanol and analyzed by sodium dodecylsulphate polyacrylamide gel electrophoresis, SDS-PAGE, on a 12% gel.
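The relation between fraction number, elution time and nominal solvent composition implied by the above settings can be sketched as follows. This is only an illustrative calculation: it assumes that fraction collection starts together with the gradient and ignores column dead volume and gradient delay, neither of which is specified herein, and the function names are hypothetical.

```python
# Nominal HPLC timing sketch. Assumptions (not stated in the text):
# fraction 1 starts at gradient time 0; no dead volume or gradient delay.
FLOW_ML_PER_MIN = 4.0    # elution flow rate
FRACTION_ML = 2.0        # volume of each collected fraction
GRADIENT_MIN = 30.0      # duration of the linear gradient
START_PCT = 25.0         # starting percentage of the organic solution
END_PCT = 100.0          # final percentage of the organic solution


def fraction_window(n):
    """Return (start, end) elution time in minutes for fraction number n."""
    minutes_per_fraction = FRACTION_ML / FLOW_ML_PER_MIN
    return ((n - 1) * minutes_per_fraction, n * minutes_per_fraction)


def percent_organic(t_min):
    """Nominal percentage of the organic solution at time t_min."""
    t = min(max(t_min, 0.0), GRADIENT_MIN)  # clamp to the gradient window
    return START_PCT + (END_PCT - START_PCT) * t / GRADIENT_MIN


if __name__ == "__main__":
    start, end = fraction_window(21)
    print(f"fraction 21: {start}-{end} min, "
          f"{percent_organic(start):.1f}% organic solution at start")
```

Under these assumptions each fraction spans half a minute, so the fraction numbers cited for FIG. 1 map directly onto positions along the gradient.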

Protein and DNA Sequencing: The polypeptide from HPLC fraction 21 (FIG. 1, lane 4) was sequenced using a Beckman 890C automated protein sequenator according to the methods described by Howard et al, Mol. Biochem. Parasit. 12:237-246, 1984. DNA was sequenced from the SmaI site (see FIG. 2b) by the Maxam and Gilbert technique as described in Methods in Enzymol. 65:499-560, 1980.

Isolation of Pertussis Toxin Genes: Chromosomal DNA was prepared from B. pertussis strain 3779 following the procedure described by Hull et al., Infec. Immunol. 33:933, 1981. The DNA was digested with both endonucleases EcoRI and BamHI and ligated into the same sites in the polylinker of pMC1403 as described by Casadaban et al., J. Bacteriol. 143:971-980, 1983; Maniatis et al, Molecular Cloning: A Laboratory Manual, 1982. The conditions for ligation were: 60 ng of vector DNA and 40 ng of insert DNA incubated with 1.5 units of T4 DNA ligase (BRL) and 1 mM ATP at 15° C. for 20 h. E. coli JM109 cells were transformed with the recombinant plasmid in accordance with the procedure of Hanahan, J. Mol. Biol. 166:557-580, 1983 and clones containing the toxin gene identified by colony hybridization at 37° C. using a ³²P-labeled 17-base mixed oligonucleotide probe 21D3 following the procedure of Woods, Focus 6:1-3, 1984. The probe was synthesized on a SAM-1 DNA synthesizer (Biosearch, San Rafael, Calif.) and consisted of the 32 possible oligonucleotides coding for 6 consecutive amino acids of the pertussis toxin subunit (Table 1). The probe was purified from a 20% urea-acrylamide gel and 5'-end labeled using 0.2 mCi of [γ-³²P]ATP (ICN, crude, 7000 Ci/mmol) and 1 unit of T₄ polynucleotide kinase (BRL) per 10 μl of reaction mixture in 50 mM Tris-HCl (pH 7.4), 5 mM DTT, 10 mM MgCl₂. The labeled oligonucleotides were purified by binding to a DEAE-cellulose column (DE52, Whatman) in 10 mM Tris-HCl (pH 7.4), 1 mM EDTA (TE) and eluted with 1.0 M NaCl in TE. Ten positive clones were isolated and purified. Plasmid DNA from these clones was extracted according to the procedure of Maniatis et al, Molecular Cloning: A Laboratory Manual, 1982, digested with routine restriction endonucleases (BRL), and then analyzed by 0.8% agarose gel electrophoresis in TBE (10 mM Tris-borate pH 8.0, 1 mM EDTA). Southern blot analysis using the ³²P-labeled oligonucleotide 21D3 as the probe showed that all 10 clones contained an identical insert of B. pertussis DNA. One clone was used for further analysis by Southern blots (FIG. 3) and for DNA sequencing.

Southern Blot Analyses: DNA, extracted as described supra, was digested and separated by electrophoresis using either 0.7% or 1.2% agarose gels in 40 mM Tris-acetate pH 8.3, 1 mM EDTA for 17 h at 30 V. The DNA was then blotted onto nitrocellulose in 20X SSPE (sodium chloride, sodium phosphate, EDTA buffer, pH 7.4) in accordance with Maniatis et al., supra, and baked at 80° C. in a vacuum oven for 2 h. Filters were prehybridized at 68° C. for 4 h in 6X SSPE, 0.5% SDS, 5X modified Denhardt's (0.1% Ficoll 400, 0.1% bovine serum albumin, 0.1% polyvinylpyrrolidone and 0.3X SSPE) and 100 μg/ml denatured herring sperm DNA. The hybridization buffer was the same as the prehybridization buffer, except EDTA was added to a final concentration of 10 mM. PstI fragments A, B, C and D were isolated by 0.8% low-melting point agarose gel electrophoresis, purified on Elutip-d columns (Schleicher and Schuell) and nick translated (BRL) using (alpha-³²P)CTP (800 Ci/mmol, NEN Research Products). The nick translated probes were hybridized at a concentration of about 1 μCi/ml for 48 h at 68° C. Filters were then washed in 2X SSPE and 0.5% SDS at room temperature (22°-25° C.) for 5 min, then in 2X SSPE and 0.1% SDS at room temperature for 15 min, and finally in 0.1X SSPE and 0.5% SDS at 68° C. for 2 h. The washed filters were air dried and exposed to X-ray film using a Lightning-Plus intensifying screen following standard techniques.

Isolation and cloning of S4 subunit gene: As mentioned above, purified pertussis toxin from B. pertussis strain 3779 was fractionated by high pressure liquid chromatography (HPLC). One fraction (Fr21) contained a polypeptide which comigrated as a major band with subunit S4 on SDS-PAGE (FIG. 1, lane 4). Although complete separation was not achieved, the major portion of each of the other toxin subunits was recovered in other HPLC fractions, i.e., S2 in Fr22, S1 and S5 in Fr23, and S3 in Fr24 (FIG. 1). The amino acid sequence of the first 30 NH₂-terminal residues of the protein in fraction 21 was determined and is shown in Table 1.

TABLE 1
__________________________________________________________________________
Protein and DNA Sequences of Pertussis Toxin Subunit, Oligonucleotide Probe and Homologous Genomic DNA Clone
__________________________________________________________________________
DNA sequence:
Predicted amino acid sequence:
##STR1##
##STR2##
Mature protein sequence:
##STR3##
##STR4##
__________________________________________________________________________
The S4 NH₂-terminal amino acid sequence determined using the automated protein sequenator is shown in blocks as the mature protein sequence. Residues that were questionable in the sequence are indicated by brackets. The DNA and predicted amino acid sequences are shown. Possible initiation codons are indicated by fMet. A putative proteolytic cleavage site is indicated by *. The oligonucleotide probe sequence is shown in the block labeled probe 21D3. The abbreviations used are: P = G or A; Y = T or C; N = A, C, G or T.

Based on the protein sequence shown in Table 1, a mixed oligonucleotide probe representing a region of six consecutive amino acids with the least redundancy of the genetic code was synthesized. In this mixture of oligonucleotides, identified as probe 21D3, approximately 1 out of 32 molecules corresponds to the actual DNA sequence of the pertussis toxin gene (Table 1). This mixed oligonucleotide probe was used to screen a DNA clone bank containing restriction fragments of total pertussis chromosomal DNA. The clone bank was prepared by digesting genomic DNA isolated from B. pertussis strain 3779 with both EcoRI and BamHI restriction endonucleases. The complete population of restriction fragments was ligated into the EcoRI/BamHI restriction site of expression vector pMC1403 and the recombinant plasmid used to transform E. coli JM109 cells following standard procedures well known in the art. It is noted that although E. coli is the preferred organism, other cloning vectors well known in the art could, of course, be alternatively used.
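The way a mixed probe with 32-fold redundancy arises can be illustrated by expanding a degenerate 17-base sequence into its individual oligonucleotides. The following is a minimal sketch: the degeneracy symbols are those defined under Table 1 (P = G or A; Y = T or C; N = A, C, G or T), whereas the probe string is a made-up example and is not the sequence of probe 21D3.

```python
from itertools import product

# Degeneracy symbols as defined in the footnote of Table 1.
DEGENERATE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "P": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "N": "ACGT",  # any base
}

# Hypothetical 17-mer (five full codons plus two bases of a sixth).
# It is NOT probe 21D3; it only has the same length and redundancy.
HYPOTHETICAL_PROBE = "TTYAAPGAYCAYGAPAT"


def degeneracy(probe):
    """Number of distinct oligonucleotides represented by a mixed probe."""
    n = 1
    for base in probe:
        n *= len(DEGENERATE[base])
    return n


def expand(probe):
    """List every individual oligonucleotide in the mixed probe."""
    choices = [DEGENERATE[base] for base in probe]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    print(len(HYPOTHETICAL_PROBE), "bases,",
          degeneracy(HYPOTHETICAL_PROBE), "oligonucleotides in the mixture")
```

With five two-fold degenerate positions the mixture contains 2⁵ = 32 species, so only about one molecule in 32 matches any given target sequence exactly.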

Approximately 20,000 colonies were screened by colony hybridization using the ³²P-end labeled oligonucleotide probe 21D3. The plasmid DNA of 10 positive colonies was examined by restriction enzyme and Southern blot analyses. All 10 colonies contained a recombinant plasmid with an identical 4.5 kb EcoRI/BamHI pertussis DNA insert. One of these clones, identified as pPTX42, was selected for further characterization. A restriction map of the insert DNA was prepared and is shown in FIG. 2b; Southern blot analysis indicated that the oligonucleotide probe 21D3 hybridized to only the 0.8 kb SmaI/PstI fragment.

A deposit of said pPTX42 clone has been made in American Type Culture Collection, Rockville, MD under the accession No. 67046. This culture will continue to be maintained for at least 30 years after a patent issues and will be available to the public without restriction, of course, in accordance with the provisions of the law.

Sequencing of the NH₂-terminal region for S4: The 0.8 kb fragment was isolated by agarose gel electrophoresis and sequenced using the Maxam and Gilbert technique, supra. The DNA sequence was translated into an amino acid sequence and a portion of that sequence is compared in Table 1 to the NH₂-terminal 30 amino acids of the pertussis toxin subunit and the oligonucleotide probe 21D3 sequence. Out of the sequence of 30 amino acid residues determined using the automated sequenator, only 2 do not correspond to the amino acid sequence deduced from the DNA sequence, i.e., residues 24 and 26, which are questionable because they repeat the amino acid in front of them and are located near the end of the analyzed sequence. Amino acid 15 could not be determined. The rest of the deduced amino acid sequence perfectly matches the original protein sequence. The oligonucleotide probe sequence also perfectly matches the cloned DNA sequence. These results indicate that at least one of the pertussis toxin subunit genes has been cloned.

Examination of the DNA sequence indicates that a precursor protein, perhaps containing a leader sequence, may exist (Table 1). In fact, the NH₂-terminal aspartic acid of the mature protein is not immediately preceded by one of the known initiation codons, i.e., ATG, GTG, TTG, or ATT, but by GCC coding for alanine, an amino acid that often occurs at the cleavage site of a signal peptide. A proline is found at amino acid position -4, which is also consistent with cleavage sites in other known sequences where this amino acid is usually present within six residues of the cleavage site. Possible translation initiation sites in the same reading frame as the mature protein and upstream of the NH₂-terminal aspartic acid are: ATG at position -9, TTG at -15, and GTG at -21; however, none of these is preceded by a Shine/Dalgarno ribosomal binding site (Nature, London, 254:34-38, 1975) and only GTG at -21 is immediately followed by a basic amino acid (arginine) preceding a hydrophobic region, characteristic of bacterial signal sequences. Using the DNA sequence data and primer extension to sequence the mRNA, the actual initiation site could also be determined.
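The search for candidate initiation codons in the reading frame of the mature protein can be sketched as a codon-by-codon walk upstream of the first mature codon. The sketch below is illustrative only: the function name is hypothetical, and the upstream sequence is a toy example rather than the sequence of Table 1, so the positions it reports do not correspond to those cited above.

```python
# Initiation codons named in the text above.
START_CODONS = {"ATG", "GTG", "TTG", "ATT"}


def upstream_start_codons(upstream):
    """Find in-frame candidate initiation codons upstream of a mature protein.

    `upstream` is the DNA lying immediately 5' of the first codon of the
    mature protein. Codon -1 is the last three bases, codon -2 the three
    bases before that, and so on. Out-of-frame triplets are ignored.
    Returns a list of (position, codon) tuples, nearest first.
    """
    upstream = upstream.upper()
    hits = []
    for k in range(1, len(upstream) // 3 + 1):
        end = len(upstream) - 3 * (k - 1)
        codon = upstream[end - 3:end]
        if codon in START_CODONS:
            hits.append((-k, codon))
    return hits


if __name__ == "__main__":
    # Toy upstream region (hypothetical): GTG CGC CTG ATG CCG GCC
    toy_upstream = "GTGCGCCTGATGCCGGCC"
    for position, codon in upstream_start_codons(toy_upstream):
        print(f"{codon} at codon position {position}")
```

Each candidate found in this way would still have to be evaluated for a preceding ribosomal binding site and for the basic and hydrophobic residues that follow it, as discussed above.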

Physical mapping of the S4 gene on the bacterial chromosome: The 1.3 kb PstI fragment B containing at least part of the pertussis toxin gene was used as a probe to physically map the location of this gene on the B. pertussis genome (FIG. 2). FIG. 3a shows a Southern blot analysis of total B. pertussis DNA digested with a variety of six base pair-specific restriction enzymes and probed with the 1.3 kb PstI fragment B isolated from pPTX42. Each restriction digest yielded only one DNA band which hybridized with the probe. Since the 1.3 kb PstI fragment B contains a SmaI site, two bands would be expected from a SmaI digest of genomic DNA unless the SmaI fragments were similar in size. Further analysis indicated that the single band seen in the SmaI digest is actually a doublet of two similar size DNA fragments. In this particular gel, fragments of 1.3 kb and smaller migrated off the gel during electrophoresis and thus could not be detected; however, in other Southern blots in which no fragment was run off the gel, only one band was found for each restriction enzyme. These results indicate that the gene encoded by the PstI fragment B occurs only once in the genome. Using the data from these experiments and similar studies using the 1.5 kb PstI fragment A and the 0.7 kb PstI/BamHI fragment D from the cloned 4.5 kb EcoRI/BamHI fragment, a partial restriction map of a 26 kb region of the pertussis genome as shown in FIG. 2a was obtained. This method made it possible to locate the first restriction site of a particular endonuclease on either side of the 4.5 kb EcoRI/BamHI fragment. This information is useful in deciphering the genetic arrangement of the toxin genes and for the cloning of larger DNA fragments of pertussis toxin.
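The expectation that a probe spanning a restriction site lights up two genomic fragments can be illustrated with a simplified in-silico digest. In this sketch the recognition sequences are the standard ones for the named enzymes, but the cut is placed at the start of each recognition site for simplicity, and the DNA string and probe coordinates are toy values, not B. pertussis data.

```python
import re

# Standard recognition sequences of the enzymes named in the text.
SITES = {
    "PstI": "CTGCAG",
    "SmaI": "CCCGGG",
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
}


def fragments(seq, enzyme):
    """Return (start, end) intervals produced by a complete digest.

    Simplification: the cut is placed at the first base of each site
    rather than at the enzyme's true cleavage position.
    """
    site = SITES[enzyme]
    cuts = [m.start() for m in re.finditer(f"(?={site})", seq.upper())]
    bounds = [0] + cuts + [len(seq)]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def hybridizing(frags, probe_start, probe_end):
    """Fragments that overlap the region covered by the probe."""
    return [(a, b) for a, b in frags if a < probe_end and b > probe_start]


if __name__ == "__main__":
    # Toy DNA (hypothetical): one SmaI site inside the probed region.
    toy = "A" * 10 + "CCCGGG" + "A" * 10
    probe = (5, 20)
    for enzyme in ("SmaI", "PstI"):
        bands = hybridizing(fragments(toy, enzyme), *probe)
        print(enzyme, "->", len(bands), "hybridizing fragment(s)")
```

In the toy case the SmaI digest yields two hybridizing fragments and the PstI digest one; on a gel, two such fragments of similar size would appear as a single band or doublet.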

Relationship of the S4 gene and Tn5-insertions: Weiss et al, Infect. Immun. 42:33-41, 1983, have developed several important Tn5-induced B. pertussis mutants deficient in different virulence factors, i.e., pertussis toxin, hemolysin, and filamentous hemagglutinin (Infect. Immun. 43:263-269, 1984; J. Bacteriol. 153:304-309, 1983). To investigate the physical relationship between the Tn5 DNA insertion and the pertussis toxin subunit gene, genomic DNA from these mutants and strain 3779 was analyzed by Southern blots using various restriction fragments of the cloned 4.5 kb EcoRI/BamHI DNA fragment as probes. In one set of experiments, blots of genomic PstI fragments were separately probed with cloned PstI fragments A, B, C, and D (FIG. 2c). The PstI fragments from the mutants and strain 3779 which hybridized with the cloned PstI fragments A, B, and D were exactly the same size; the blot probed with PstI fragment B is shown in FIG. 3b. However, when the PstI fragment C was used as a probe, the genomic DNA from mutant strains BP356 and BP357 showed a clear difference in the size of the PstI fragments that hybridized as compared to strain 3779 and the other mutant strains (FIG. 3c, lanes 6 and 7). These results indicate that this fragment contains the site of the Tn5 insertion. As expected, two labeled fragments were found, since the Tn5 DNA insert has two symmetrical PstI sites. Other Southern blots (not shown) in which genomic BglII and SmaI fragments were hybridized with the 4.5 kb EcoRI/BamHI cloned probe, and the data from FIG. 3c, clearly show that the Tn5 DNA was inserted 1.3 kb downstream from the start of the mature pertussis toxin S4 subunit in the two mutant strains that were characterized as pertussis toxin negative phenotypes, i.e., BP356 and BP357 (FIG. 2b). This insertion is beyond the termination codon for the S4 subunit (11.7 kD). Examination of these toxin negative mutants by Western blots using monoclonal antibodies for individual subunits indicates that the Tn5 DNA is not inserted in the subunit structural genes for S1 or S2 (unpublished results). The pertussis toxin negative phenotype of strains BP356 and BP357 can be explained by either of two nonexclusive mechanisms. The Tn5 DNA may be inserted into the coding regions of either S3, S5, or perhaps another gene required for toxin assembly or transport. Alternatively, the Tn5 insertion could disrupt the expression of essential downstream cistrons in a polycistronic operon. Similar Southern blot analyses of genomic BamHI and EcoRI fragments indicate that none of the other virulence factor genes represented by the other Tn5-insertion mutants are located within the 17 kb region defined by the first BamHI and the second EcoRI sites as shown in FIG. 2a.
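The statement that an insertion 1.3 kb downstream from the start of the mature S4 subunit lies beyond its termination codon can be checked with a rough estimate of the coding length of an 11.7 kD polypeptide. The calculation below is a sketch that assumes an average residue mass of about 110 daltons, a common rule of thumb that is not taken from the present description; the function names are illustrative.

```python
import math

# Assumption: average mass of an amino acid residue in a protein,
# a widely used approximation (not a value from the text).
AVG_RESIDUE_DA = 110.0


def approx_coding_length_bp(protein_da):
    """Approximate length in bp of the coding region for a mature protein,
    including the termination codon."""
    residues = math.ceil(protein_da / AVG_RESIDUE_DA)
    return 3 * residues + 3


def insertion_is_downstream(insertion_offset_bp, protein_da):
    """True if an insertion at the given offset from the first mature codon
    falls beyond the estimated end of the coding region."""
    return insertion_offset_bp > approx_coding_length_bp(protein_da)


if __name__ == "__main__":
    length = approx_coding_length_bp(11_700)
    print(f"estimated mature S4 coding length: about {length} bp")
    print("1.3 kb insertion beyond stop codon:",
          insertion_is_downstream(1_300, 11_700))
```

An 11.7 kD polypeptide needs only a few hundred base pairs of coding sequence, far less than 1.3 kb, which is consistent with the insertion lying outside the S4 structural gene.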

Nucleotide Sequence

Having described the identification, isolation, and construction of recombinant plasmid pPTX42, containing pertussis toxin genes, the insert DNA from this plasmid, i.e., the 4.5 kb EcoRI/BamHI fragment shown in FIG. 4a, was digested with various restriction enzymes and subcloned by standard procedures (Maniatis et al., supra) using the cloning vectors M13 mp18 and M13 mp19 and E. coli strain JM101 as described by Messing, Methods Enzymol. 101:20-78, 1983. Both strands of the DNA were sequenced using either the Maxam and Gilbert base-specific chemical cleavage method, supra, or the dideoxy chain termination method of Sanger et al., PNAS, 74:5463-5467, 1977, with the universal 17-base primer, or both. The DNA sequence and the derived amino acid sequence were analyzed using MicroGenie® computer software.

Because of the high C+G content of B. pertussis DNA, it was necessary to use both of the above mentioned methods with a combination of 8% and 20% polyacrylamide-8 M urea gels for sequence analysis. Each nucleotide has been sequenced in both directions an average of 4.13 times. The final consensus sequence of the sense strand is shown in Table 2. It is noted that the sequence of the S4 subunit gene has been included in this table for completeness since this sequence lies in the middle of the structural gene sequence presented in Table 2. The entire sequence contains about 62.2% C+G with about 19.6% A, 33.8% C, 28.4% G and 18.2% T in the sense strand, wherein A, T, C and G represent the nucleotides adenine, thymine, cytosine and guanine, respectively.
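Base composition figures of the kind quoted above can be computed directly from a sequence string. The following minimal sketch uses illustrative function names and a toy sequence rather than the sequence of Table 2; it also verifies that the percentages reported above are internally consistent (C plus G equal to 62.2%, all four bases summing to 100%).

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T in a DNA sequence.

    Characters other than A, C, G and T are ignored.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def gc_percent(seq):
    """Combined percentage of C and G."""
    comp = base_composition(seq)
    return comp["C"] + comp["G"]


# Percentages reported in the text for the sense strand.
REPORTED = {"A": 19.6, "C": 33.8, "G": 28.4, "T": 18.2}

if __name__ == "__main__":
    print(base_composition("ACGTCCGG"))  # toy sequence, not Table 2
    print("reported C+G:", round(REPORTED["C"] + REPORTED["G"], 1))
    print("reported total:", round(sum(REPORTED.values()), 1))
```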

TABLE 2
__________________________________________________________________________
Complete Nucleotide Sequence of Pertussis Toxin Gene
__________________________________________________________________________
##STR5## ##STR6## ##STR7## ##STR8## ##STR9## ##STR10## ##STR11## ##STR12##
##STR13## ##STR14## ##STR15## ##STR16## ##STR17## ##STR18## ##STR19## ##STR20##
##STR21## ##STR22## ##STR23## ##STR24## ##STR25## ##STR26## ##STR27## ##STR28##
##STR29## ##STR30## ##STR31## ##STR32## ##STR33## ##STR34## ##STR35## ##STR36##
##STR37## ##STR38## ##STR39## ##STR40## ##STR41## ##STR42## ##STR43## ##STR44##
##STR45## ##STR46## ##STR47## ##STR48## ##STR49## ##STR50## ##STR51## ##STR52##
##STR53## ##STR54## ##STR55## ##STR56## ##STR57## ##STR58## ##STR59## ##STR60##
##STR61## ##STR62## ##STR63## ##STR64## ##STR65## ##STR66## ##STR67## ##STR68##
##STR69## ##STR70## ##STR71## ##STR72## ##STR73## ##STR74## ##STR75##
__________________________________________________________________________
The deduced amino acid sequences of the individual subunits are shown in the single letter code below the nucleotide sequence. The proposed signal peptide cleavage sites are indicated by asterisks. The start of the protein coding region for each subunit is indicated by the box and arrow over the initiation codon. Putative ribosomal binding sites are underlined. The promoter-like sequence is shown in the -35 and -10 boxes. Proposed transcriptional start site is indicated by the arrow in the CAT box. Inverted repeats are indicated by the arrows in the flanking regions.

Assignment of the subunit cistrons

The DNA sequence shown in Table 2 was translated in all six reading frames, and the reading frames are shown in FIG. 4b,c. The open reading frame (ORF) corresponding to the S4 subunit was identified and is shown in FIG. 4d. The assignment of the other subunits to their respective ORFs is based on the following lines of evidence: size of ORFs, high coding probability, deduced amino acid composition, predicted molecular weights, ratios of acidic to basic amino acids, amino acid homology to other bacterial toxins, mapping of Tn5-induced mutations, and partial amino acid sequence.
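The six-frame scan described above can be illustrated with a minimal sketch. It is not the procedure actually used for Table 2: the function names, the toy sequence and the `min_codons` threshold are hypothetical, and an "open" stretch is simply taken to be a run of codons free of termination codons on either strand.

```python
# Minimal six-frame scan for stop-free stretches (illustrative only).
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def open_frames(seq, min_codons=3):
    """List (strand, frame, start, end) for every stop-free stretch.

    start and end are 0-based offsets on the scanned strand, end exclusive.
    Stretches shorter than min_codons codons are ignored.
    """
    found = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run_start = frame
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOP_CODONS:
                    if (i - run_start) // 3 >= min_codons:
                        found.append((strand, frame, run_start, i))
                    run_start = i + 3
            # Close the stretch that runs to the last complete codon.
            last = frame + ((len(s) - frame) // 3) * 3
            if (last - run_start) // 3 >= min_codons:
                found.append((strand, frame, run_start, last))
    return found

toy = "ATGGCCGTGTAAATG"  # hypothetical sequence, not taken from Table 2
frames = open_frames(toy, min_codons=3)
for hit in frames:
    print(hit)
```

In practice the length threshold would be set high enough to accommodate the smallest subunit, and each candidate would then be assessed with the further criteria listed above.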

Significant ORFs, long enough to code for any of the five toxin subunits, were analyzed by the statistical TESTCODE algorithm, which is designed to differentiate between real protein coding sequences and fortuitous open reading frames (Fickett, Nucleic Acids Res. 10:5303, 1982). The amino acid composition of each ORF with a high protein coding probability was calculated, starting from either the predicted amino terminus of the mature protein or the first amino acid of the mature protein as determined by amino acid sequencing of HPLC-purified subunits. These data were then compared with the experimentally-determined compositions of the individual subunits as described by Tamura et al., Biochem. 21:5516, 1982. Based on the similarity of the amino acid compositions shown in Table 3, all five subunits were identified and assigned to the ORF regions shown in FIG. 4d. Table 3 shows that the deduced amino acid compositions of all five assigned subunits are in good agreement with the experimentally-determined compositions of Tamura et al., supra, with two significant exceptions. First, the S1 subunit contains no lysine residues in the deduced amino acid sequence, whereas 2.2% lysine was experimentally determined. Second, in subunits S2, S3, S4, and S5 the proportion of cysteines was substantially underestimated in the experimentally observed compositions. These discrepancies, as well as the remaining minor differences observed for all subunits, including the previously assigned S4 subunit, can most reasonably be explained by experimental error during amino acid analysis. Similar analyses, in which a DNA-deduced amino acid composition was compared with an experimentally-derived amino acid composition, show the same minor differences. The absence of lysine residues in S1 may explain why lysine-specific chemical modification does not affect the biological and enzymatic activities of S1. The amino acid compositions of the ORFs (FIG. 4b,c) not assigned to any subunit show no similarity to any of the experimentally-determined amino acid compositions, although some of these ORFs are quite long and have a high coding potential. It is possible that these regions code for other proteins, perhaps involved in the assembly or transport of pertussis toxin.

The experimentally-estimated molecular weight and isoelectric point of the individual subunits were compared to the calculated molecular weight and ratio of acidic to basic amino acids of the putative proteins encoded by the ORFs shown in FIG. 4. As expected for this comparison, Table 3 shows that differences in the ratios reflect corresponding differences in the observed isoelectric points for each subunit, i.e., the higher the acidic content, the lower the isoelectric point. The comparison of the molecular weights also shows good correspondence to the experimentally-determined values, with slight differences for the S1 (less than 10%) and the S5 (about 15%) subunits. These small differences are within acceptable limits for protein molecular weights determined by SDS-PAGE.
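The size of the two molecular weight deviations can be checked from the Mr row of Table 3. The sketch below is illustrative only: it assumes the calculated value is the reference (denominator), reads the S5 observed entry as 9.3 k, and uses hypothetical names.

```python
# Observed (SDS-PAGE) and calculated (DNA-derived) molecular weights in kDa,
# copied from the Mr row of Table 3 for the two subunits singled out in the text.
MR_KDA = {
    "S1": {"observed": 28.0, "calculated": 26.0},
    "S5": {"observed": 9.3, "calculated": 11.0},  # observed entry read as 9.3 k
}

def percent_deviation(observed, calculated):
    """Absolute deviation of the observed value, as a percentage of the calculated one."""
    return abs(observed - calculated) / calculated * 100.0

deviations = {name: percent_deviation(**mr) for name, mr in MR_KDA.items()}
for name, dev in deviations.items():
    print(f"{name}: {dev:.1f}% deviation")
```

With these inputs the S1 deviation is below 10% and the S5 deviation is close to 15%, in line with the figures quoted above.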

                                      TABLE 3
__________________________________________________________________________
Comparison of the Observed Amino Acid Composition With the Calculated
Composition From DNA Sequence for Mature Pertussis Toxin Subunits
Obs. = observed values.sup.a (for S4, from Exp. 1 and Exp. 2); Calc. = calculated values.
__________________________________________________________________________
          S1                S2                S3                S4                         S5
          Obs.     Calc.    Obs.     Calc.    Obs.     Calc.    Exp. 1   Exp. 2   Calc.    Obs.     Calc.
__________________________________________________________________________
Mr.sup.b  28 k     26.0 k   23 k     21.9 k   22 k     21.9 k   11.7 k   --       12.1 k   9.3 k    11.0 k
A/B.sup.c --       1.3      --       0.89     --       0.83     --       --       0.65     --       1.4
pI.sup.d  5.8      --       8.5      --       8.8      --       10.0     10.0     --       5.0      --
Ala       10.6     11.5     6.5      6.0      11.7     11.1     9.4      9.8      8.2      9.8      9.0
Arg       5.9      9.0      6.2      6.0      6.1      6.5      5.1      5.4      5.5      3.3      3.0
Asn.sup.e 9.3      5.6      6.3      2.5      6.3      2.0      5.3      5.0      0.9      8.2      3.0
Asp       --       4.3      --       4.0      --       4.0      --       --       3.6      --       5.0
Cys       1.0      0.9      1.3      3.0      1.1      3.0      0.9      0.7      3.6      1.6      4.0
Gln.sup.f 10.6     3.0      8.7      3.5      9.0      4.5      9.5      9.1      3.6      9.3      3.0
Glu       --       7.3      --       4.0      --       3.5      --       --       4.5      --       6.0
Gly       11.2     7.7      13.0     10.6     11.9     10.1     9.6      8.9      6.4      8.7      8.0
His       1.7      2.6      2.4      2.0      1.0      1.0      0.5      0.5      0.9      3.0      3.0
Ile       3.2      3.4      4.2      5.5      5.0      6.5      2.0      1.8      1.8      3.4      3.0
Leu       5.5      3.4      7.3      7.5      8.1      8.0      8.4      8.7      9.1      13.8     15.0
Lys       2.2      0        3.4      3.0      2.7      2.5      6.9      7.6      7.3      4.7      5.0
Met       1.6      1.7      1.4      1.5      1.1      1.5      5.1      4.3      7.3      1.6      2.0
Phe       3.5      3.0      3.2      2.5      3.2      2.5      3.6      4.5      4.5      4.9      5.0
Pro       4.4      3.4      4.6      4.5      5.7      5.0      9.1      9.9      10.0     5.6      5.0
Ser       10.6     9.8      8.5      8.5      6.3      5.0      8.0      7.3      5.5      6.9      6.0
Thr       7.4      7.3      10.4     10.1     8.2      8.0      5.0      5.1      4.5      6.9      7.0
Trp       ND.sup.g 0.9      ND       1.0      ND       0.5      ND       ND       0        ND       1.0
Tyr       4.6      8.1      7.6      8.0      7.9      9.5      2.2      2.0      1.8      4.3      4.0
Val       6.7      7.3      4.9      6.0      4.7      5.0      9.4      9.4      10.9     4.0      3.0
__________________________________________________________________________
.sup.a Data from Tamura et al., Biochem. 21:5516, 1982.
.sup.b Mr = molecular weight.
.sup.c A/B = acidic amino acids (Glu + Asp) ÷ basic amino acids (Arg + Lys).
.sup.d pI = isoelectric pH.
.sup.e Observed values are Asn + Asp.
.sup.f Observed values are Gln + Glu.
.sup.g ND = not determined.
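The A/B ratio defined in footnote c of Table 3 can be recomputed from the calculated Glu, Asp, Arg and Lys percentages. The following is a minimal sketch using the rounded table values; the dictionary and function names are hypothetical.

```python
# Calculated mole-percent values for the charged residues, copied from Table 3.
CALCULATED = {
    "S1": {"Glu": 7.3, "Asp": 4.3, "Arg": 9.0, "Lys": 0.0},
    "S2": {"Glu": 4.0, "Asp": 4.0, "Arg": 6.0, "Lys": 3.0},
    "S3": {"Glu": 3.5, "Asp": 4.0, "Arg": 6.5, "Lys": 2.5},
    "S4": {"Glu": 4.5, "Asp": 3.6, "Arg": 5.5, "Lys": 7.3},
    "S5": {"Glu": 6.0, "Asp": 5.0, "Arg": 3.0, "Lys": 5.0},
}

# Observed isoelectric points from the pI row of Table 3 (Exp. 1 value for S4).
OBSERVED_PI = {"S1": 5.8, "S2": 8.5, "S3": 8.8, "S4": 10.0, "S5": 5.0}

def acid_base_ratio(comp):
    """(Glu + Asp) / (Arg + Lys), as defined in footnote c of Table 3."""
    return (comp["Glu"] + comp["Asp"]) / (comp["Arg"] + comp["Lys"])

ratios = {name: acid_base_ratio(comp) for name, comp in CALCULATED.items()}

# Subunits ordered from most acidic to most basic by each measure.
by_ratio = sorted(ratios, key=ratios.get, reverse=True)
by_pi = sorted(OBSERVED_PI, key=OBSERVED_PI.get)
print(by_ratio, by_pi)
```

Both orderings run S5, S1, S2, S3, S4, so a higher acidic content corresponds to a lower isoelectric point. The recomputed S4 ratio is about 0.63 rather than the tabulated 0.65, a gap that may simply reflect rounding of the percentages.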

                  TABLE 4
______________________________________
Comparison of Two Homologous Regions in ADP-ribosylating
Subunits of Pertussis, Cholera, and E. coli Heat Labile Toxins
______________________________________
Region 1
Pertussis S1 subunit          (8)  Tyr Arg Tyr Asp Ser Arg Pro Pro (15)
Cholera.sup.a A subunit       (6)  Tyr Arg Ala Asp Ser Arg Pro Pro (13)
E. coli.sup.a HLT A subunit   (6)  Tyr Arg Ala Asp Ser Arg Pro Pro (13)
Region 2
Pertussis S1 subunit          (51) Val Ser Thr Ser Ser Ser Arg Arg (58)
Cholera.sup.a A subunit       (60) Val Ser Thr Ser Ile Ser Leu Arg (67)
E. coli.sup.a HLT A subunit   (60) Val Ser Thr Ser Leu Ser Leu Arg (67)
______________________________________
The numbers in parentheses refer to the amino acid position in the mature proteins.
.sup.a Data from Yamamoto, et al. FEBS Letter 169:241, 1983. HLT = Heat Labile Toxin.
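The extent of identity in the two regions of Table 4 can be tallied directly. The sketch below is a plain position-by-position comparison of the listed residues; the dictionary keys and the function name are hypothetical.

```python
# Residues of the two homologous regions, copied from Table 4.
REGIONS = {
    "Region 1": {
        "pertussis_S1": "Tyr Arg Tyr Asp Ser Arg Pro Pro".split(),
        "cholera_A": "Tyr Arg Ala Asp Ser Arg Pro Pro".split(),
        "ecoli_HLT_A": "Tyr Arg Ala Asp Ser Arg Pro Pro".split(),
    },
    "Region 2": {
        "pertussis_S1": "Val Ser Thr Ser Ser Ser Arg Arg".split(),
        "cholera_A": "Val Ser Thr Ser Ile Ser Leu Arg".split(),
        "ecoli_HLT_A": "Val Ser Thr Ser Leu Ser Leu Arg".split(),
    },
}

def count_differences(a, b):
    """Number of aligned positions at which two residue lists differ."""
    return sum(x != y for x, y in zip(a, b))

differences = {
    other: sum(
        count_differences(region["pertussis_S1"], region[other])
        for region in REGIONS.values()
    )
    for other in ("cholera_A", "ecoli_HLT_A")
}
print(differences)  # {'cholera_A': 3, 'ecoli_HLT_A': 3}
```

Against either enterotoxin, the pertussis S1 subunit differs at three of the sixteen aligned positions (one in Region 1 and two in Region 2).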

                                      TABLE 5
__________________________________________________________________________
Comparison of Codon Usage Between Pertussis Toxin and
Strongly and Weakly Expressed E. coli Genes
__________________________________________________________________________
           Pertussis toxin.sup.a         E. coli.sup.b   |            Pertussis toxin.sup.a         E. coli.sup.b
AA   Codon S1  S2  S3  S4  S5  PTX.sup.c S.sup.c W.sup.c | AA   Codon S1  S2  S3  S4  S5  PTX.sup.c S.sup.c  W.sup.c
__________________________________________________________________________
Ala  GCU   3   0   1   0   1   5         33      17      | Lys  AAA   0   2   0   1   1   4         49       31
     GCC   17  7   14  9   4   52        9       34      |      AAG   0   5   7   7   4   24        20       8
     GCA   5   3   2   1   1   12        23      20      | Met  AUG   4   3   4   9   2   22        27       25
     GCG   9   5   8   5   5   33        25      28      | Phe  UUU   0   1   0   1   1   3         7        29
Arg  CGU   3   2   0   1   0   6         42      19      |      UUC   7   4   5   4   4   25        22       19
     CGC   12  7   9   4   0   33        19      25      | Pro  CCU   1   1   0   1   0   3         4        6
     CGA   1   0   0   0   0   1         1       5       |      CCC   5   3   2   6   1   17        0.4      9
     CGG   5   3   1   2   2   13        0.2     8       |      CCA   0   1   2   0   0   3         5        9
     AGA   1   1   1   0   1   4         1       5       |      CCG   4   6   7   5   5   28        31       19
     AGG   3   1   3   0   0   7         0.2     3       | Ser  UCU   0   1   0   0   0   1         18       7
Asn  AAU   4   2   0   1   1   8         2       19      |      UCC   7   6   3   2   4   23        17       9
     AAC   9   3   6   0   2   20        30      19      |      UCA   0   2   0   0   0   2         1        7
Asp  GAU   2   3   1   2   1   9         22      35      |      UCG   5   0   2   0   2   9         2        12
     GAC   8   6   7   2   5   29        39      20      |      AGU   0   0   0   1   0   1         2        11
Cys  UGU   0   0   0   0   0   0         2       6       |      AGC   12  10  5   5   3   36        9        12
     UGC   3   7   6   4   4   25        4       7       | Thr  ACU   4   2   1   1   2   10        20       9
Gln  CAA   1   2   3   3   0   9         7       17      |      ACC   10  9   8   3   4   35        26       23
     CAG   7   5   7   1   3   24        32      32      |      ACA   3   1   1   0   0   5         3        6
Glu  GAA   10  5   5   5   3   29        63      40      |      ACG   6   9   7   2   2   27        5        15
     GAG   7   3   2   0   3   15        20      19      | Trp  UGG   5   2   1   1   1   10        5        13
Gly  GGU   1   1   2   1   0   5         43      24      | Tyr  UAU   8   6   8   2   3   28        6        18
     GGC   15  16  13  7   7   59        33      27      |      UAC   11  10  11  0   2   35        19       12
     GGA   3   4   3   0   2   12        1       8       | Val  GUU   2   1   1   1   0   5         37       21
     GGG   0   1   3   0   0   4         3       13      |      GUC   10  7   6   6   3   33        8        13
His  CAU   3   4   1   1   2   11        4       18      |      GUA   3   1   2   1   0   7         23       9
     CAC   3   2   3   1   2   11        14      11      |      GUG   4   5   2   4   2   17        16       24
Ile  AUU   3   3   3   0   0   9         13      30      | End  UAA   --  --  --  --  --  0         ND.sup.d ND
     AUC   7   8   9   2   4   31        15      23      |      UAG   1   --  --  --  --  1         ND       ND
     AUA   0   1   4   0   2   7         0.4     5       |      UGA   --  1   1   1   1   4         ND       ND
Leu  UUA   0   1   0   0   0   1         2       14      | fMet AUG   1   1   1   --  1   4         ND       ND
     UUG   1   2   3   2   3   11        3       12      |      GUG   --  --  --  1   --  1         ND       ND
     CUU   1   2   2   1   1   7         5       14      |
     CUC   4   7   5   3   4   24        6       13      |
     CUA   0   1   0   0   0   1         1       4       |
     CUG   5   9   14  9   10  48        66      56      |
__________________________________________________________________________
.sup.a Absolute codon usage for the subunit cistrons, including the signal peptides (see Table 2). The numbers of codons in the five individual subunits are 269 (S1), 227 (S2), 228 (S3), 132 (S4), and 121 (S5).
.sup.b Data deduced from Grosjean and Fiers, Gene 18:199, 1982. S = strongly expressed genes; W = moderately to weakly expressed genes.
.sup.c Relative codon usage per 1000 codons. Pertussis usage based on 977 codons for the pertussis toxin gene (PTX). E. coli usage based on 5253 codons for highly expressed genes (S) and 5231 codons for moderately to weakly expressed genes (W).
.sup.d ND = not determined.
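The relationship between the absolute counts and the PTX column in Table 5 (footnotes a and c) can be reproduced for a few codons. This is a minimal sketch: the selection of codons is arbitrary, the names are hypothetical, and rounding to the nearest integer is an assumption that happens to match the tabulated values.

```python
# Codons per cistron, from footnote a of Table 5.
CISTRON_CODONS = {"S1": 269, "S2": 227, "S3": 228, "S4": 132, "S5": 121}
TOTAL_CODONS = sum(CISTRON_CODONS.values())  # 977, as stated in footnote c

# Absolute counts (S1, S2, S3, S4, S5) for a few codons, copied from Table 5.
ABSOLUTE_COUNTS = {
    "GCU": (3, 0, 1, 0, 1),
    "GCC": (17, 7, 14, 9, 4),
    "GGC": (15, 16, 13, 7, 7),
    "CUG": (5, 9, 14, 9, 10),
}

def per_thousand(counts, total=TOTAL_CODONS):
    """Relative usage per 1000 codons, rounded to the nearest integer."""
    return round(sum(counts) * 1000 / total)

relative = {codon: per_thousand(c) for codon, c in ABSOLUTE_COUNTS.items()}
print(TOTAL_CODONS, relative)
# 977 {'GCU': 5, 'GCC': 52, 'GGC': 59, 'CUG': 48}
```

The strong preference for GCC over GCU among the alanine codons, the reverse of the pattern in strongly expressed E. coli genes, is visible directly in these figures.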

The assignment for S1 in the location shown in FIG. 4d is further supported by a significant homology of two regions in the S1 amino acid sequence with two related regions in the A subunits of both cholera and E. coli heat labile toxins. These homologous regions, shown in Table 4, may be part of functional domains for a catalytic activity in the subunits of all three toxins. Furthermore, the assignment for S1, as well as the correct prediction of the signal peptide cleavage site, is supported by preliminary amino acid sequence data for the mature protein (unpublished results).

Subunits S2 and S3 share 70% amino acid homology, which makes the correct assignment of these subunits to their ORFs difficult if it is based only on the amino acid composition and the molecular weight. Nevertheless, the gene order could be determined as shown in FIG. 4d based on the location of a Tn5-induced mutation responsible for the lack of active pertussis toxin in the supernatant of the mutant B. pertussis strains. This Tn5 insertion was mapped 1.3 kb downstream of the start site for the S4 subunit gene, as indicated by the arrow in FIG. 4a. As can be seen in FIG. 4, the Tn5 insertion in those mutants would be located in the ORF for S3. Although unable to produce active pertussis toxin, the mutants are still able to produce the S2 subunit. Thus, the Tn5 insertion in those mutants is not located in the structural gene for S2. Therefore, the ORFs for S2 and S3 could be differentiated.

Amino acid sequences

The amino acid sequence for each subunit was deduced from the nucleotide sequence and is shown in Table 2. The mature proteins contain 234 amino acids for S1, 199 amino acids for S2, 110 amino acids for S4, 100 amino acids for S5 and 199 amino acids for S3, in the order of the gene arrangement from the 5'-end to the 3'-end. Most likely all subunits contain signal peptides, as expected for secretory proteins. The length of the putative signal peptides was estimated after analysis of the hydrophobicity plot, the predicted secondary structure and application of von Heijne's rule for the prediction of the most probable signal peptide cleavage site. The cleavage site for each subunit is shown in Table 2 by the asterisks. The correct prediction of the cleavage sites for S4 and S1 (unpublished) was confirmed by amino terminal sequencing of the purified mature subunits. The length of the signal peptides varies from 34 residues for S1, 28 residues for S3, and 27 residues for S2, to 21 residues for S4, and 20 residues for S5. All of the signal peptides contain a positively-charged amino terminal region of variable length, followed by a sequence of hydrophobic amino acids, usually in α-helical or partially α-helical, partially β-pleated conformation. A less hydrophobic carboxy-terminal region follows, usually ending in a β-turn conformation at the signal peptide cleavage site. All subunits except S5 follow the -1, -3 rule, which positions the cleavage site after Ala-X-Ala. The amino-terminal charge for the subunit signal peptides varies between +4 for S1 and +1 for S4 and S5. All described properties correspond very well to the general properties for bacterial signal peptides.
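The mature lengths and signal peptide lengths given above can be cross-checked against the codon totals in footnote a of Table 5. The sketch below assumes that each cistron total is the signal peptide plus the mature protein plus one termination codon; that reading is an assumption that happens to reproduce the published totals, and the names are hypothetical.

```python
# Lengths in residues, as stated in the text.
SIGNAL_PEPTIDE = {"S1": 34, "S2": 27, "S3": 28, "S4": 21, "S5": 20}
MATURE_PROTEIN = {"S1": 234, "S2": 199, "S3": 199, "S4": 110, "S5": 100}

# Codons per cistron, from footnote a of Table 5.
TABLE_5_CODONS = {"S1": 269, "S2": 227, "S3": 228, "S4": 132, "S5": 121}

def cistron_codons(subunit):
    """Precursor length plus one termination codon (assumed accounting)."""
    return SIGNAL_PEPTIDE[subunit] + MATURE_PROTEIN[subunit] + 1

codon_totals = {s: cistron_codons(s) for s in MATURE_PROTEIN}
print(codon_totals == TABLE_5_CODONS, sum(codon_totals.values()))  # True 977
```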

Two different initiation codons are used for the translation of the subunits in B. pertussis, i.e., the most frequently used ATG for S1, S2, S3 and S5, and the less frequently used GTG for S4. The codon usage (Table 5) is unsuitable for efficient translation of the pertussis toxin gene in E. coli. This is reflected by the codon choice for frequently used amino acids, such as alanine, arginine, glycine, histidine, lysine, proline, serine and valine. Whether pertussis toxin is a strongly or weakly expressed protein in B. pertussis and whether this expression is regulated by the presence of a precise relative amount of the different tRNA isoacceptors, possibly different from E. coli, remains to be established. This can be evaluated by in vitro translation using E. coli and B. pertussis cell-free extracts.

Closer examination of the amino acid sequence reveals the striking absence of lysines in S1. Another interesting feature is the overall relatively high amount of cysteines as compared to E. coli proteins. Cysteines do not seem to be involved in inter-subunit links to construct the quaternary structure of the toxin, since all subunits can be easily separated by SDS-PAGE in the absence of reducing agents. Most likely, the cysteines are involved in intrachain bonds, since reducing agents significantly change the electrophoretic mobility of all subunits but S4. Serines, threonines and tyrosines also are represented more frequently than in average E. coli proteins. The hydroxyl groups of these residues may be involved in the quaternary structure through hydrogen bonding.

Analysis of the flanking regions

Since all pertussis toxin subunits are closely linked and probably expressed in a very precise ratio, it is possible that they are arranged in a polycistronic operon. A polycistronic arrangement for the subunit cistrons also has been described for other bacterial toxins bearing similar enzymatic functions, such as diphtheria, cholera and E. coli heat labile toxins. Therefore, the flanking regions were analyzed for the presence of transcriptional signals. In the 5' flanking region, starting at position 469, the sequence TAAAATA was found, which matches six of the seven nucleotides found in the ideal TATAATA Pribnow or -10 box. An identical sequence can be found in several other bacterial promotors, including the lambda L57 promotor. Given the fact that most transcripts start at a purine residue about 5-7 nucleotides downstream from the Pribnow box, the transcriptional start site was tentatively located at the adenine residue at position 482. This residue is located in the sequence CAT, often found at transcriptional start sites. Upstream from the proposed -10 box, the sequence CTGACC starts at position 442. This sequence matches four of the six nucleotides found in the ideal E. coli -35 box TTGACA. The mismatching nucleotides in the proposed pertussis toxin -35 box are the two end nucleotides, of which the 3' residue is the less important nucleotide in the E. coli -35 consensus box. A replacement of the T by a C in the first position of the consensus sequence can also be found in several E. coli promotors. The distance between the two proposed promotor boxes is 21 nucleotides; a distance of the same length has been found in the galP1 promotor and in several plasmid promotors. The proposed -35 box is immediately preceded by two overlapping short inverted repeats with calculated free energies of -15.6 kcal and -8.6 kcal, respectively. Inverted repeats can also be found at the 5'-end of the cholera toxin promotor. In both cases, they may be involved in positive regulation of the toxin promotors. None of the ORFs assigned to the other subunits is closely preceded by a similar promotor-like structure. However, a different promotor-like structure was found associated with the S4 subunit ORF.
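The consensus matches and the box spacing quoted above can be verified with a short calculation. The sketch assumes that the stated positions are 1-based and refer to the first nucleotide of each box; the names are hypothetical.

```python
def count_matches(candidate, consensus):
    """Number of positions at which two equal-length sequences agree."""
    if len(candidate) != len(consensus):
        raise ValueError("sequences must have the same length")
    return sum(a == b for a, b in zip(candidate, consensus))

# Sequences and start positions as quoted in the text.
MINUS_35 = {"seq": "CTGACC", "start": 442, "consensus": "TTGACA"}
MINUS_10 = {"seq": "TAAAATA", "start": 469, "consensus": "TATAATA"}
TRANSCRIPTION_START = 482

def last_position(box):
    """Position of the final nucleotide of a box (inclusive numbering assumed)."""
    return box["start"] + len(box["seq"]) - 1

matches_35 = count_matches(MINUS_35["seq"], MINUS_35["consensus"])
matches_10 = count_matches(MINUS_10["seq"], MINUS_10["consensus"])
# Nucleotides lying strictly between the two boxes.
spacing = MINUS_10["start"] - last_position(MINUS_35) - 1
# Distance from the last nucleotide of the -10 box to the proposed start site.
start_offset = TRANSCRIPTION_START - last_position(MINUS_10)

print(matches_35, matches_10, spacing, start_offset)  # 4 6 21 7
```

Under this numbering the -35 box ends at position 447 and the -10 box at position 475, which gives the 21-nucleotide spacing and places the proposed start site 7 nucleotides downstream of the Pribnow box.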

The 3'-flanking region has been examined for the presence of possible transcriptional termination sites. Several inverted repeats could be found; the most significant is located in the region extending from position 4031 to 4089 and has a calculated free energy of -41.4 kcal. None of the inverted repeats is immediately followed by an oligo(dT) stretch, which may suggest that they function in a rho-dependent fashion. Preliminary experiments indicate, however, that neither inverted repeat functions efficiently in E. coli (results not shown). Whether they are functional in B. pertussis remains to be established and can be investigated by small-deletion or site-directed mutagenesis experiments, which are feasible now that the DNA sequence is known. Another possibility is that the five different subunits may not be the only proteins encoded in the polycistronic operon and that cistrons for other peptides, possibly involved in regulation, assembly or transport, are cotranscribed. Non-structural proteins involved in the posttranslational processing of E. coli heat labile toxin have been proposed. However, no significantly long ORF was found at the 3'-end of the nucleotide sequence shown in FIG. 4b. If other proteins are encoded by the same polycistronic operon, their coding regions must be located further downstream.

Additionally, the 5'-flanking region of each cistron was examined for the presence of ribosomal binding sites. Neither the ribosomal binding sequences for B. pertussis genes nor the 3'-end sequence of the 16 S rRNA are known. Therefore, the flanking regions could be compared only with the ribosomal binding sequences of heterologous procaryotic organisms, represented by the Shine-Dalgarno sequence. Preceding the S1 initiation codon, the sequence GGGGAAG was found starting at position 495. This sequence shares four out of seven nucleotides with the ideal Shine-Dalgarno sequence AAGGAGG. The first two mismatching nucleotides in the pertussis toxin gene would not destabilize the hybridization to the 3'-end of the E. coli 16 S rRNA. This putative ribosomal binding site is close enough to the initiation codon for S1 to be functional in E. coli. Another possible Shine-Dalgarno sequence overlaps the first one and also matches four out of seven nucleotides of the consensus sequence. The mismatching nucleotides, however, have a more destabilizing effect than the ones found in the first sequence. The S2 subunit ORF is not closely preceded by a ribosomal binding sequence, which may suggest that S2 is translated through a mechanism not involving the detachment and reattachment of the ribosome between the coding regions for S1 and S2. The short distance between the S1 and S2 cistrons and the absence of a ribosomal binding site are characteristic of this mechanism. A ribosomal binding site for S4 in the sequence CAGGGCGGC, starting at position 2066, is possible. The ORF for S5 is preceded by the sequence AAGGCG, starting at position 2485, which matches five out of six nucleotides in the consensus sequence AAGGAG. Finally, S3 is preceded by the sequence GGGAACAC, which is very similar to the proposed ribosomal binding site for S1, i.e., GGGAAGAC.
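The positions at which the putative ribosomal binding sites depart from the sequences they are compared with can be listed as follows. This is an ungapped, position-by-position comparison only, with no model of rRNA hybridization stability; the function name is hypothetical.

```python
def mismatch_positions(candidate, reference):
    """1-based positions at which two equal-length sequences differ."""
    if len(candidate) != len(reference):
        raise ValueError("sequences must have the same length")
    pairs = zip(candidate, reference)
    return [i for i, (a, b) in enumerate(pairs, start=1) if a != b]

# Putative S1 site against the ideal Shine-Dalgarno sequence.
s1_site = mismatch_positions("GGGGAAG", "AAGGAGG")
# Putative S5 site against the six-nucleotide consensus.
s5_site = mismatch_positions("AAGGCG", "AAGGAG")
# Sequence preceding S3 against the proposed S1 site.
s3_vs_s1 = mismatch_positions("GGGAACAC", "GGGAAGAC")

print(s1_site, s5_site, s3_vs_s1)  # [1, 2, 6] [5] [6]
```

The S1 site therefore matches at four of seven positions, with two of its three mismatches at the first two nucleotides; the S5 site matches at five of six positions; and the sequence preceding S3 differs from the proposed S1 site at a single position.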

Taken as a whole, the results described herein clearly establish the complete nucleotide sequence of all structural cistrons for pertussis toxin. The gene order, as shown in FIG. 4, is S1, S2, S4, S5, and S3. The calculated molecular weights from the deduced sequence of the mature peptides are 26,024 for S1; 21,924 for S2; 12,058 for S4; 11,013 for S5 and 21,873 for S3. Since S4 is present in two copies per toxin molecule, the total molecular weight for the holotoxin is about 104,950. This is in agreement with the apparent molecular weight estimated by non-denaturing PAGE. The most striking feature of the predicted peptide sequences is the high homology between S2 and S3. The two peptides share 70% amino acid homology and 75% nucleotide homology. This suggests that both cistrons were generated through a duplication of an ancestral cistron followed by mutations which resulted in functionally-different peptides. The differences between S2 and S3 are scattered throughout the whole sequence and are slightly more frequent in the amino-terminal half of the peptides. Despite their high homology, also reflected in the predicted secondary structures and hydrophilicities, the S2 and S3 subunits cannot substitute for each other in the functionally-active pertussis toxin. The comparison between the two subunits may be useful in localizing their functional domains in relation to their primary, secondary and tertiary structure. On the basis of the differences, S2 and S3 are divided into two domains, the amino-terminal and the carboxy-terminal. Each of the subunits binds to an S4 subunit. This function could be located in the more conserved carboxy-terminal domains of S2 and S3. The two resulting dimers are thought to bind to one S5 subunit. This function could be assigned to the more divergent amino-terminal domains of S2 and S3. Alternatively, it is possible that the dimers bind to the S5 subunit through S4 and that the amino-terminal domains of S2 and S3 are involved in some other function, possibly the interaction of the binding moiety (S2 through S5) with the enzymatically-active moiety (S1).
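The holotoxin molecular weight quoted above follows from the subunit weights and the copy numbers. A minimal sketch of the arithmetic, with hypothetical names:

```python
# Calculated molecular weights of the mature subunits, as stated in the text.
SUBUNIT_MW = {"S1": 26024, "S2": 21924, "S3": 21873, "S4": 12058, "S5": 11013}

# Copies per holotoxin molecule; S4 is present twice.
COPIES = {"S1": 1, "S2": 1, "S3": 1, "S4": 2, "S5": 1}

holotoxin_mw = sum(SUBUNIT_MW[s] * COPIES[s] for s in SUBUNIT_MW)
print(holotoxin_mw)  # 104950
```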

The enzymatically-active S1 subunit was compared to the A subunits of other bacterial toxins. Two regions with significant homology to cholera and E. coli heat labile toxins were found (Table 4). They are tandemly located in analogous regions of all three toxins. However, the three amino acid differences found in these regions cannot be explained by single base pair changes in the DNA. Furthermore, in most cases the homologous amino acids use quite different codons in pertussis toxin compared to cholera and E. coli heat labile toxins. This, together with the fact that no other significant homology in the primary structure could be found and that the amino acid sequences of the other subunits are completely different from the sequence of any other ADP-ribosylating toxin, strongly suggests that pertussis toxin is not evolutionarily related to any of the other known bacterial toxins. The limited homology of the S1 subunit to the A subunits of cholera and E. coli heat labile toxins could be due to convergent evolution, since all three toxins contain a very similar enzymatic activity and use a relatively closely-related acceptor substrate (Ni protein for pertussis toxin and Ns protein for cholera and E. coli heat labile toxins). The NAD-binding site for the two enterotoxins has been identified at the carboxy-terminal region of their A1 subunit. No significant homology could be found between the carboxy-terminal region of the enterotoxins, or of any other NAD-binding enzymes, and the analogous region in the S1 subunit. This suggests that the NAD-binding function of the ADP-ribosylating enzymes is dependent more on the secondary or tertiary structures than on the primary structures. It is proposed that the two enzymatically-active domains lie in different regions of the protein, one at the amino-terminal half of the subunit for the acceptor substrate (Ni) binding and the other at the carboxy-terminal half of the subunit for the donor substrate (NAD⁺) binding.

The presence of a promotor-like structure upstream of the S1 subunit cistron and possible transcriptional termination signals downstream of the S3 subunit cistron suggests that pertussis toxin, like many other bacterial toxins, is expressed through a polycistronic mRNA. The inverted repeats immediately preceding the proposed promotor may be sites for positive regulation of expression of the toxin in B. pertussis. Evidence for positive regulation came from the discovery of the vir gene, the product of which is essential for the production of many virulence factors, including pertussis toxin. Recent evidence in our laboratory suggests that the proposed inverted repeats in the 3' flanking region are not very efficient in transcriptional termination in E. coli (results not shown). The termination of transcription in B. pertussis may be carried out by a slightly different mechanism than in E. coli; alternatively, the polycistron may contain other, not yet identified, genes related to expression of functionally-active pertussis toxin or other virulence factors. We have described a promotor-like structure preceding the S4 subunit cistron and possible termination signals following the S4 cistron. The S4 promotor-like structure is quite different from the proposed promotor at the beginning of the S1 subunit cistron. It is part of an inverted repeat, suggesting iron regulation of S4 subunit expression. This is supported by the fact that chelating agents stimulate the accumulation of active pertussis toxin in cell supernatants. It is thus possible that pertussis toxin is expressed efficiently by two dissimilar promotors, one (promotor 1) located in the 5'-flanking region and the other (promotor 2) located upstream of S4. The two promotors would be regulated by different mechanisms. Promotor 1 would be positively regulated, possibly by the vir gene product, and promotor 2 would be negatively regulated by the presence of iron. Under optimal expression conditions, such as in the presence of the vir gene product and in the absence of iron, the S4 subunit cistron would be transcribed twice for every transcription of the other subunits. This is a mechanism that would explain the stoichiometry of the pertussis toxin subunits of 1:1:1:2:1 for S1:S2:S3:S4:S5, respectively, in the biologically active holotoxin.
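The proposed two-promotor mechanism can be expressed as a simple count of cistron copies per round of transcription. The Python sketch below is a minimal illustration and not a quantitative model: it assumes that promotor 1 yields a transcript covering all five cistrons, that promotor 2 yields a transcript covering S4 only, and that subunit yield is proportional to transcript copies. The equal firing rates used in the usage note are likewise an assumption, and the names are hypothetical.

```python
from collections import Counter

SUBUNITS = ("S1", "S2", "S3", "S4", "S5")

# Cistrons carried by the transcript started at each proposed promotor.
# Assumption: promotor 1 reads through the whole operon, promotor 2 covers S4 only.
TRANSCRIPTS = {
    "promotor 1": SUBUNITS,
    "promotor 2": ("S4",),
}


def cistron_copies(firings):
    """Return transcript copies per cistron for the given promotor firing counts."""
    copies = Counter()
    for promotor, count in firings.items():
        for subunit in TRANSCRIPTS[promotor]:
            copies[subunit] += count
    return {subunit: copies[subunit] for subunit in SUBUNITS}


def ratio_string(copies):
    """Format the copies as an S1:S2:S3:S4:S5 ratio string."""
    return ":".join(str(copies[subunit]) for subunit in SUBUNITS)
```

With both promotors firing once, `ratio_string(cistron_copies({"promotor 1": 1, "promotor 2": 1}))` gives `1:1:1:2:1`, the stoichiometry of the holotoxin. With promotor 2 silent, as it would be in the presence of iron under this hypothesis, the same call with `"promotor 2": 0` gives `1:1:1:1:1`.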

Attempts to express the pertussis toxin gene in E. coli have heretofore been unsuccessful, although very sensitive monoclonal and polyclonal antibodies are available. This lack of expression in E. coli may be due to the fact that B. pertussis promotors are not efficiently recognized by the E. coli RNA polymerase. Analysis of the promotor-like structures of the pertussis toxin gene and their comparison to strong E. coli promotors show very significant differences, the most striking of which are the unusual distances between the proposed -35 and -10 boxes in the pertussis toxin promotors. The distance between those two boxes in strong E. coli promotors is around 17 nucleotides, whereas the distances in the two putative pertussis toxin promotors are 21 nucleotides for the polycistronic promotor and 10 nucleotides for the S4 subunit promotor. Preliminary results in our laboratory using expression vectors designed to detect heterologous expression signals which are able to function in E. coli further indicate that B. pertussis promotors may not be recognized by the E. coli expression machinery. In addition, the codon usage for pertussis toxin is extremely inefficient for translation in E. coli (Table 5). Preliminary experiments show that the insertion of a fused lac/trp promotor in the KpnI site upstream of the pertussis toxin operon probably enhances transcription but does not produce detectable levels of pertussis toxin (unpublished results). Efficient expression in E. coli would require resynthesis of the pertussis toxin operon, respecting the optimal codon usage for E. coli. It is not known whether the codon usage for pertussis toxin reflects the optimal codon usage for expression in B. pertussis, since no other B. pertussis gene has heretofore been sequenced.
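The two comparisons made above, promotor spacing and codon usage, can be illustrated with a short Python sketch. The spacer lengths are the values stated in the text; the function names are hypothetical, and the codon tally is a generic helper that does not reproduce the data of Table 5.

```python
from collections import Counter

# Spacing between the -35 and -10 boxes, in nucleotides, as stated in the text.
ECOLI_SPACER = 17  # approximate value for strong E. coli promotors
PT_SPACERS = {
    "polycistronic promotor": 21,
    "S4 subunit promotor": 10,
}


def spacer_deviation(spacers, reference=ECOLI_SPACER):
    """Return each spacer length minus the reference spacer length."""
    return {name: length - reference for name, length in spacers.items()}


def codon_counts(coding_sequence):
    """Tally the in-frame codons of a coding sequence, read from its first base."""
    sequence = coding_sequence.upper()
    if len(sequence) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(sequence[i:i + 3] for i in range(0, len(sequence), 3))
```

Here `spacer_deviation(PT_SPACERS)` reports a spacing 4 nucleotides longer than the E. coli value for the polycistronic promotor and 7 nucleotides shorter for the S4 subunit promotor. Applying `codon_counts` to each cistron and comparing the tallies with a table of codons preferred in E. coli would be one way to quantify the mismatch in codon usage; the preferred-codon table itself is not given here and would have to be supplied.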

The cloned and sequenced pertussis toxin genes are useful for the development of an efficient and safer vaccine against whooping cough. By comparison to other toxin genes with similar biochemical functions and by physical identification of the active sites, either for the ADP-ribosylation in the S1 subunit or for the target cell binding in subunits S2 through S4, it is now possible to modify those sites by site-directed mutagenesis of the B. pertussis genome. These modifications could abolish the pathobiological activities of pertussis toxin without hampering its immunogenicity and protectivity. Alternatively, knowledge of the DNA sequence now makes it possible to map potential protective epitopes. Synthetic oligopeptides comprising those epitopes will also be useful in the development of a new generation vaccine.

It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and the scope of the appended claims. The theories suggested herein are only reasonable explanations based on the current knowledge and facts and are propounded without in any manner being bound to them.

We claim:
1. An isolated gene consisting essentially of DNA encoding pertussis toxin.
2. A recombinant DNA vector containing the gene of claim 1.
3. An E. coli containing the recombinant DNA vector of claim 2.